Abstract

The nuclear genomes of euglenids contain three types of introns: conventional spliceosomal introns, nonconventional introns for which a splicing mechanism is unknown (variable noncanonical borders, RNA secondary structure bringing together intron ends), and so-called intermediate introns, which combine features of conventional and nonconventional introns. Analysis of two genes, tubA and tubB, from 20 species of euglenids reveals contrasting distribution patterns of conventional and nonconventional introns: positions of conventional introns are conserved, whereas those of the nonconventional ones are unique to individual species or small groups of closely related taxa. Moreover, in the group of phototrophic euglenids, 11 events of conventional intron loss versus 15 events of nonconventional intron gain were identified. A comparison of all nonconventional intron sequences highlighted the most conserved elements in their sequence and secondary structure. Our results led us to put forward two hypotheses. 1) The first posits that mutational changes in intron sequence could lead to a change in their excision mechanism; intermediate introns would then be a transitional form between conventional and nonconventional introns. 2) The second hypothesis concerns the origin of nonconventional introns: because of the presence of inverted repeats near their ends, insertion of MITE-like transposable elements is proposed as a possible source of new introns.
Introduction

Euglenids (Euglenida), together with the heterotrophic flagellates diplonemids (Diplonemea), symbiontids (Symbiontida), and kinetoplastids (Kinetoplastea), form the ancient group Euglenozoa within the supergroup Excavata (Hampl et al. 2009; Adl et al. 2012). Some reports, however, place Euglenozoa as the first group to branch off from the main evolutionary lineage of eukaryotes (Cavalier-Smith 2010). No sexual reproduction or any other process of exchanging genetic material has been observed in euglenids so far.

The phylogeny of euglenids has been studied intensively in recent years, and reliable, consistent phylogenetic trees describing evolution within this group have been obtained (Linton et al. 1999; Marin et al. 2003; Milanowski et al. 2006; Triemer et al. 2006; Linton et al. 2010; Yamaguchi et al. 2012). These studies led to the conclusion that all phototrophic euglenids form a monophyletic lineage. Their common ancestor (a heterotrophic euglenid) acquired its chloroplast by secondary endosymbiosis with a green alga, probably a member of the genus Pyramimonas (Prasinophyceae; Turmel et al. 2009; Hrda et al. 2012). Phylogenetic relationships are best established within the phototrophic euglenids, which are grouped into 12 genera.

Our understanding of the organization of the genetic material in euglenids is limited. Their chloroplast genomes are the best described: the complete plastid genomes of Euglena gracilis (Hallick et al. 1993), E. longa (Gockel and Hachtel 2000), Eutreptiella gymnastica (Hrda et al. 2012), Eutreptia viridis (Wiegert et al. 2012), Monomorphina aenigmatica (Pombert et al. 2012), Colacium vesiculosum, Strombomonas costata (Wiegert et al. 2013), and E. viridis (Bennett et al. 2012) are known. Very little is known about the mitochondrial genomes of euglenids, but their structure is probably unique. Earlier studies indicated that the mitochondrial genome of E. gracilis comprises numerous short and long, circular and linear DNA molecules (Roy et al. 2007). The organization of nuclear genetic material in euglenids is also unclear. Surprisingly, even the number of chromosomes in the best-examined species, E. gracilis, is uncertain (Dooijes et al. 2000). An E. gracilis genome sequencing project is currently in progress; the genome is unexpectedly large, estimated at about 250,000,000 bp (GOLD stamp: Gi07537). Several unusual features of its organization have also been observed: rRNA genes are located on extrachromosomal circular molecules present in the cell in hundreds to thousands of copies (Cook and Roxby 1985; Ravel-Chapuis 1988), and there is no single full-length 28S rRNA; instead, 13 short RNA molecules form its equivalent (Schnare and Gray 1990). Moreover, in euglenids a spliced leader is transferred to the 5' end of most pre-mRNAs in the process of spliceosome-dependent trans-splicing (Ebel et al. 1999).
An unprecedented feature of euglenid genomes is the presence of three types of introns: 1) conventional spliceosomal introns with canonical GT/C-AG borders, 2) nonconventional introns for which a splicing mechanism is unknown (noncanonical and variable borders, RNA secondary structure bringing together the ends of the intron), and 3) so-called intermediate introns. It should be noted that analyses of complete genomes of trypanosomatids (relatives of euglenids) have revealed very few introns. In these parasitic flagellates, only two genes, those encoding tRNA(Tyr) (Schneider et al. 1994) and poly(A) polymerase (Mair et al. 2000), are disrupted by introns, whereas in E. gracilis most examined genes contain introns. Conventional spliceosomal introns were found in genes encoding fibrillarin (Breckenridge et al. 1999; Russell et al. 2005), α-, β-, and γ-tubulin (Canaday et al. 2001), and in genes encoding proteins targeted to chloroplasts: eno29, gapA, pbgD, petA, psaF, psbM, and psbW (Vesteg et al. 2010). The presence of conventional introns has also been confirmed in the heterotrophic euglenids Entosiphon sulcatum (tubB; Ebel et al. 1999) and Peranema trichophorum (hsp90; Breglia et al. 2007). Euglena gracilis genes also contain so-called nonconventional introns, which have neither GT/C-AG borders nor the AT-AC borders characteristic of some minor spliceosomal introns. However, very little is known about this type of intron: neither the mechanism of their removal nor any factors involved have been identified. They are apparently removed by a spliceosome-independent mechanism, because their 5' ends are not complementary to U1 snRNA (Breckenridge et al. 1999). It has only been noted that all introns of this type form a stable RNA secondary structure bringing the two splice sites together, which is probably needed for their proper removal. This structure, however, is not conserved and shows no common features with the self-splicing group I, II, or III introns (Muchhal and Schwartzbach 1994; Henze et al. 1995; Tessier et al. 1995; Canaday et al. 2001). It also seems that nonconventional introns are excised before the addition of the spliced leader at the 5' end of nuclear pre-mRNAs (trans-splicing) and before the excision of conventional spliceosomal introns (Tessier et al. 1995). Initially, introns of this type were observed only in nuclear genes of chloroplast origin (lhcpII, rbcS; Muchhal and Schwartzbach 1994; Tessier et al. 1995) and in the gapC gene of eubacterial origin (Henze et al. 1995). On this basis, it was hypothesized that nonconventional introns were derived from the genome of a secondary endosymbiont (Ebel et al. 1999). Later, however, nonconventional introns were also found in E. gracilis genes encoding α- and β-tubulin (Canaday et al. 2001), which derive from the genome of the host cell. A comparison of intron positions in the tubulin genes of diverse species suggested that most of the conventional introns were evolutionarily old, as they occurred in the same positions in other organisms, whereas the nonconventional introns were present in unique positions, suggesting that they were evolutionarily younger than the spliceosomal ones (Canaday et al. 2001). In 2007, the presence of nonconventional introns, besides conventional ones, was revealed in the hsp90 gene of the primarily heterotrophic euglenid P. trichophorum (Breglia et al. 2007), which called into question the earlier hypothesis of an endosymbiotic origin of nonconventional introns. Recently, nonconventional introns were also found in two E. gracilis genes encoding plastid-targeted proteins, petJ and psbW (Vesteg et al. 2010). Some authors also distinguish a third type of nuclear intron in E. gracilis, so-called "intermediate" introns, as observed in the genes encoding α- and β-tubulin (Canaday et al. 2001) and fibrillarin (Russell et al. 2005). These introns form a stable secondary structure bringing their ends together; one or even both intron borders are consistent with the GT/C-AG rule; and the 5' end of the intron is to some extent complementary to U1 snRNA (although the complementarity is much weaker than that of conventional introns).
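The U1 snRNA complementarity criterion can be made concrete with a small scoring function. The sketch below rests on stated assumptions: it uses the 5' region of human U1 snRNA (5'-ACUUACCUG-3', whose reverse complement is the splice-site consensus CAG|GUAAGU) as a stand-in, because the E. gracilis U1 sequence is not given here; it counts Watson-Crick pairs only unless wobble pairing is requested; and the function name `u1_pairing_score` is hypothetical.

```python
# Hypothetical helper: scores how well a 5' splice-site window could base-pair
# with the 5' end of U1 snRNA. The human U1 region 5'-ACUUACCUG-3' is used as a
# stand-in (assumption); the E. gracilis U1 sequence is not used here.
U1_5PRIME = "ACUUACCUG"  # human U1 snRNA nucleotides 3-11

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def u1_pairing_score(exon_tail: str, intron_head: str, wobble: bool = False) -> int:
    """Count base pairs between a 9-nt splice-site window and U1 snRNA.

    exon_tail: last 3 nt of the upstream exon (positions -3..-1).
    intron_head: first 6 nt of the intron (positions +1..+6).
    """
    if len(exon_tail) != 3 or len(intron_head) != 6:
        raise ValueError("expected 3 exon nt and 6 intron nt")
    site = (exon_tail + intron_head).upper().replace("T", "U")
    allowed = (WATSON_CRICK | WOBBLE) if wobble else WATSON_CRICK
    # The two RNAs pair antiparallel: the first site nt faces the last U1 nt.
    return sum((a, b) in allowed for a, b in zip(site, reversed(U1_5PRIME)))


print(u1_pairing_score("CAG", "GTAAGT"))  # perfect consensus: all 9 positions pair
```

A perfect conventional consensus scores 9; windows from nonconventional or intermediate introns would be expected to score lower, which is the qualitative contrast described above.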

To date, there are no data on the evolution of introns in euglenids, and little is known about the distribution of introns in nuclear genes in the context of the group's phylogeny. It is thus impossible to answer basic questions such as whether conventional, nonconventional, and intermediate introns occur in conserved positions in different evolutionary lineages, or whether introns of one type can be replaced by introns of another type. Answering these questions seems crucial for addressing more complex issues, such as the mechanism of excision of nonconventional introns. The answers should also help to clarify the role of nonconventional introns in the functioning and evolution of euglenid genomes; a recent study of genes encoding chloroplast-targeted proteins suggests that both conventional and nonconventional introns may be hot spots of DNA recombination, and it was proposed that this mechanism leads to the replacement or acquisition of plastid-targeting leader sequences (Vesteg et al. 2010).

As reviewed above, understanding the distribution of conventional, nonconventional, and intermediate introns may be key to resolving basic issues concerning the evolution of euglenids and their genomes. Here, we report an analysis of intron distribution in two nuclear genes, tubA and tubB, from 20 species (18 phototrophic or secondary heterotrophic and 2 primary heterotrophic) representing 13 genera of euglenids.

Results and Discussion 

Types of Introns in Tubulin Genes 
Using nested polymerase chain reaction (PCR) amplification with degenerate primers on genomic DNA, 30 tubA and 39 tubB sequences were obtained, encompassing 78% and 86% of the coding regions, respectively. More than one form of the tubA gene was obtained for 8 species, and more than one form of the tubB gene for 11 species; the differences between forms from the same species were minor and mostly concerned sequence differences within introns and at third codon positions.

To determine intron positions, the genomic sequences were compared with cDNA sequences obtained by reverse transcription of mRNA. Introns were absent from both genes in two representatives of Eutreptiales, Eutreptia viridis and Eut. pomquetensis, whereas in another representative of Eutreptiales, Eutreptiella braarudii, and in the primary heterotroph Ent. sulcatum, no introns were found in the tubA gene. The remaining genes contained between one and five introns, occupying 10 and 15 unique positions in tubA and tubB, respectively. Their locations are depicted schematically in figure 1, together with the proposed intron type and a phylogenetic tree of the respective species; supplementary tables S1 and S2, Supplementary Material online, show sequences of intron junctions in the tubulin genes.

Fig. 1. Distribution and postulated evolution of introns in tubA and tubB genes of euglenids. Left: schematic phylogenetic trees reflecting the phylogeny of euglenids (Busse et al. 2003; Marin et al. 2003; Linton et al. 2010). Species names of primary heterotrophs are in gray, those of phototrophs and secondary heterotrophs in black. Predicted intron presence in common ancestors (+) and postulated events of intron gain (\) and loss (x) are indicated on branches (alternative scenarios in gray). Right: distribution of introns in tubA and tubB genes. Circles on horizontal bars next to species names indicate introns in the specific positions; black circles: conventional introns; gray: conventional/intermediate; gray and white: intermediate/nonconventional; white: nonconventional.

Introns with the consensus border sequence GT/C-AG are present in tubA genes in positions indicated in figure 1 as 1 (in the primary heterotroph Menoidium bibacillatum), 3 (M. bibacillatum and representatives of Euglenales), 4 and 10 (Euglenales), and 5 (E. gracilis), and in tubB genes in positions 1 and 3 (Euglenales), 2, 7, and 14 (M. bibacillatum), and 4 (Ent. sulcatum). They were classified as conventional spliceosomal introns (C), except for the introns in positions 4 and 5 in the tubA genes from Pha. limnophila and E. gracilis, respectively, and the intron in position 1 in the tubB gene from E. agilis, for which a stable secondary structure bringing together the intron ends was predicted; these were classified as conventional/intermediate introns (C/I).

Introns in Tubulin Genes of Euglenids • doi:10.1093/molbev/mst227

MBE

Most introns lacking the consensus sequence GT/C-AG at both ends are present in positions unique to single species (tubA: 2, 6, 7, and 8; tubB: 5, 6, 10, and 11) or common to several closely related species forming well-defined clades (tubA: 9; tubB: 8, 9, 13, and 15). The exceptions are the tubB introns in position 12, present in two relatively distant species, S. costata and Lepocinclis spirogyroides, and the tubA intron in E. quartana in position 5, where a C/I intron is also present in E. gracilis. The E. gracilis intron contains direct CAG repeats at the intron-exon junctions, so four alternative positions can be defined for it; the position preserving the GC-AG ends of the intron is one nucleotide downstream of the position of the E. quartana intron.
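Direct repeats at the junctions mean that several excision points yield the same spliced sequence, which is why an intron can have more than one admissible position. One way to enumerate them is to try every offset and keep those whose removal reproduces the cDNA. This is a minimal sketch assuming a single intron and exact identity between exons and cDNA; `candidate_intron_positions` is a hypothetical helper, not the procedure used in the study.

```python
def candidate_intron_positions(genomic: str, spliced: str) -> list[tuple[int, int]]:
    """Return every (start, end) such that removing genomic[start:end] yields spliced.

    Assumes exactly one intron separates the two sequences. When identical
    nucleotides flank the intron (direct repeats), several offsets are equally
    consistent with the cDNA, so more than one pair is returned.
    """
    intron_len = len(genomic) - len(spliced)
    if intron_len <= 0:
        return []
    hits = []
    for start in range(len(spliced) + 1):
        end = start + intron_len
        if genomic[:start] + genomic[end:] == spliced:
            hits.append((start, end))
    return hits


# Toy example (invented sequences): a G repeated at both junctions lets the
# 5-nt intron slide by one nucleotide.
genomic = "TTCA" + "GTAAG" + "GACC"
spliced = "TTCAGACC"
for start, end in candidate_intron_positions(genomic, spliced):
    print(start, end, genomic[start:end])
```

In the toy case only one of the two admissible excisions has GT...AG ends, mirroring how the choice of position determines whether an intron looks conventional.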

Introns lacking the consensus GT/C-AG sequences at their ends were initially classified as nonconventional (N); all of them can form a stable stem-loop RNA structure bringing the splice sites together. As most of them have direct repeats at the intron-exon junctions, it is difficult to determine their exact positions. To predict the most likely locations, all known nonconventional introns lacking repeats, whose locations could be determined unambiguously (14 introns), were compared, and a nucleotide logo of their junctions was created (supplementary fig. S1, Supplementary Material online). The secondary structure of these introns was predicted as well. The ends of all the nonconventional introns in the tubulin genes were then fitted to the consensus obtained in this way (supplementary tables S1 and S2, Supplementary Material online) and to the secondary structures predicted for the introns used to obtain the consensus. Six introns with a 5' end conforming to the GT/C rule were excluded from the group of nonconventional introns and classified as intermediate/nonconventional (I/N).
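The classification rules described above can be summarized as a small decision function. This is a simplified sketch under assumptions: the border checks are applied to every admissible intron position (whereas the text fits I/N introns to a single consensus-guided position), the secondary-structure call is taken as an input assumed to come from an external RNA-folding prediction, and `classify_intron` is a hypothetical name.

```python
CANONICAL_5P = ("GT", "GC")  # GT/C rule for the intron 5' end
CANONICAL_3P = "AG"          # AG rule for the intron 3' end


def classify_intron(candidate_seqs: list[str], stable_structure: bool) -> str:
    """Assign C, C/I, I/N, or N to an intron (simplified sketch).

    candidate_seqs: the intron sequence (DNA, 5'->3') at every position
        consistent with the cDNA; several when direct repeats flank it.
    stable_structure: whether a stable stem-loop joining the intron ends was
        predicted (assumed to come from an external folding tool).
    """
    seqs = [s.upper() for s in candidate_seqs]
    canonical_5p = any(s.startswith(CANONICAL_5P) for s in seqs)
    canonical_both = any(
        s.startswith(CANONICAL_5P) and s.endswith(CANONICAL_3P) for s in seqs
    )
    if canonical_both:
        # GT/C-AG ends at some admissible position: conventional, or C/I if structured.
        return "C/I" if stable_structure else "C"
    if stable_structure:
        # Only the 5' end follows the GT/C rule: I/N; otherwise nonconventional.
        return "I/N" if canonical_5p else "N"
    return "unclassified"  # combination not covered by the scheme in the text


print(classify_intron(["GTAAG", "TAAGG"], stable_structure=True))
```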

Tubulin Genes with Nonconventional Introns: Active or Inactive Forms?

As mentioned above, in some cases more than one form of a gene was obtained, suggesting the presence of intraspecies polymorphisms or of several copies of the gene in a single genome. In both cases, the question is whether all cloned forms of the tubulin genes are active. This problem especially concerns genes in which nonconventional introns are observed: if these genes are inactive, the nonconventional introns should be considered as undefined insertion elements that are not spliced out, rather than as introns. To avoid including inactive forms of genes in the analysis, the genomic and cDNA sequences were compared (for details, see Materials and Methods); however, we cannot rule out the possibility that two copies of a gene, an active form without introns and an inactive one with introns and/or insertion elements, coexist in a single genome. On the other hand, each PCR amplification of genomic DNA generated individual products corresponding to the forms containing introns, whereas products corresponding to intronless forms were not detected. This result suggests that nonconventional introns are indeed present in active forms of the genes.

Phylogenetic Analysis 

The trees obtained using the tubA and tubB sequences do not faithfully reflect the phylogeny of euglenids, because of the relatively small amount of data derived from a highly conserved protein (supplementary fig. S2, Supplementary Material online). Nevertheless, all well-supported relationships in those trees are consistent with the more reliable phylogenetic tree obtained previously from more sophisticated analyses of larger data sets (fig. 1).

Toward a Model of Nonconventional Intron Structure 
To find common features of nonconventional introns, the sequences of all known nonconventional introns (74) were compared, and a sequence logo for euglenid nonconventional intron junctions was created (fig. 2). Comparing the sequence logo with the RNA secondary structure of nonconventional introns, two nucleotides at positions +4 and +5 (5' end of the intron) and the complementary nucleotides at positions −7 and −6 (3' end of the intron) stand out as their most conserved feature; in most cases, CA and TG, respectively, are present at these positions (a representative structure is shown in fig. 2). Other nucleotides involved in maintaining the stem-loop structure are less conserved or not conserved at all. Other conserved features observed for most nonconventional introns are a pyrimidine at the 3' end of the upstream exon and at the 3' end of the intron, a purine at the 5' end of the intron and at the 5' end of the downstream exon, and a C at the third position of the downstream exon. Although the nucleotides at positions +2 and +3 and −2, −3, −4, and −5 of the intron are not conserved, their ability to pair is preserved in some introns (e.g., the intron in fig. 2). These conserved features agree with the observations of Muchhal and Schwartzbach (1994) for nonconventional introns in LHCPII-coding genes of E. gracilis.
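Both the per-position conservation shown in a sequence logo and the CA/TG stem anchor can be checked computationally. A minimal sketch, assuming pre-aligned junction sequences of equal length in DNA alphabet; the function names are hypothetical, and logo software usually adds a small-sample correction that is omitted here. Because the stem pairs antiparallel, position +4 faces −6 and +5 faces −7.

```python
import math
from collections import Counter

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def information_content(aligned: list[str]) -> list[float]:
    """Per-column information content in bits (2 - Shannon entropy) for aligned DNA.

    This is the stack height in a DNA sequence logo; no small-sample
    correction is applied (simplification).
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    width = len(aligned[0])
    if any(len(s) != width for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    bits = []
    for col in range(width):
        counts = Counter(s[col].upper() for s in aligned)
        n = sum(counts.values())
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        bits.append(2.0 - entropy)
    return bits


def anchor_pairs_form(intron: str, allow_wobble: bool = True) -> bool:
    """Check whether intron nt +4/+5 can pair with nt -6/-7 (antiparallel).

    Positions are 1-based from the intron 5' end (+) and 3' end (-).
    """
    seq = intron.upper().replace("T", "U")
    if len(seq) < 12:
        raise ValueError("intron too short for non-overlapping +4/+5 and -7/-6")
    allowed = (WATSON_CRICK | WOBBLE) if allow_wobble else WATSON_CRICK
    return (seq[3], seq[-6]) in allowed and (seq[4], seq[-7]) in allowed


# Invented toy intron with CA at +4/+5 and TG at -7/-6.
print(anchor_pairs_form("AGTCAGGGGTGCCTTC"))
print(information_content(["AC", "AG", "AT", "AA"]))
```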

Conventional Introns 

Conventional spliceosomal introns are unique to eukaryotes. Although intron density varies among evolutionary lineages of eukaryotes, no eukaryote lacking spliceosomal introns is known. The lack of introns in the nucleomorph genome of Hemiselmis andersenii (Lane et al. 2007) is a special case: the nucleomorph, located in the periplastid space of the chloroplast, is just a remnant nucleus of a secondary endosymbiont. Because of their similar excision mechanisms, it is widely accepted that spliceosomal introns evolved from self-splicing group II introns, and it appears that the common ancestor of eukaryotes acquired them with the genome of an endosymbiotic α-proteobacterium, the progenitor of mitochondria. However, some authors argue that group II introns and eukaryotic spliceosomes only share a common ancestor, namely a proto-spliceosome, which evolved in the RNA world as a mechanism to excise functional RNAs from ancient RNA genomes (Vesteg et al. 2012).
Fig. 2. Model of nonconventional intron junctions. (A) Intron 9 in the tubA1 gene from Euglena gracilis, an example of nonconventional intron secondary structure; exons in gray, intron in black; the most conserved nucleotides are boxed. Sequence logos of the 5' (B) and 3' (C) exon/intron junctions (splice sites indicated by vertical lines) created from 74 nonconventional intron sequences (36 introns used in this study; introns in the tubA, tubB, rbcS, lhcpII, gapC, nopip, psbj, psbO, and psbW genes from E. gracilis; and two introns in the hsp90 gene from P. trichophorum).



When considering the evolution of spliceosomal introns, one has to mention the long-lasting debate between supporters of the "intron-early" and "intron-late" hypotheses. The first hypothesis assumed that introns were present in the prokaryotic ancestor of eukaryotes and that the present differences in their occurrence among evolutionary lineages are due to independent losses. According to the latter, spliceosomal introns, being a typically eukaryotic "invention," were inserted into originally intron-free genes. At present, a synthetic theory of spliceosomal intron evolution combining both hypotheses is widely accepted. It can be summarized as "many introns early in eukaryotic evolution" and has strong support from a wide range of data (for review, see Rogozin et al. 2012).

Spliceosomal introns are common in tubulin-coding genes throughout eukaryotes; some are unique to individual groups, whereas others occur in the same positions in distantly related species (Perumal et al. 2005), which is consistent with the synthetic concept above. A similar situation is observed for the conventional spliceosomal introns in euglenid genes coding for α- and β-tubulins: some occur in positions shared by distantly related species (introns 3 and 4 in tubA), whereas the positions of others are unique. Unfortunately, in the other representatives of Excavata examined to date, the tubA and tubB genes are intronless, so it is difficult to reconstruct the pattern of intron distribution in the putative common ancestor of all the taxa analyzed in this study. It is much easier to analyze the distribution of conventional introns in photosynthetic euglenids alone: the most parsimonious hypothesis is that the common ancestor of all phototrophic taxa had introns in positions 1 and 3 in the tubB gene and at least one intron (position 3) in tubA. The common ancestor of Euglenales had two more introns in the tubA gene, in positions 4 and 10. Based on an analysis of intron distribution in the photosynthetic euglenids, 11 events of intron loss are predicted: in the tubA gene, introns 3 and 4 were lost in E. quartana, intron 3 in the common ancestor of Euglenaria anabaena and Monomorphina pyrum, intron 4 in Monomorphina pyrum, intron 10 in the ancestor of Trachelomonas similis and S. costata, intron 4 in Discoplastis spathirhyncha, and intron 3 in the common ancestor of Eutreptiales; in the tubB gene, introns 1 and 3 were lost independently in Eutreptia viridis and in Eut. pomquetensis (see fig. 1). On the other hand, there is no clear-cut evidence of a conventional intron gain in the tubA or tubB genes of photosynthetic euglenids. However, such an event could have taken place in their common ancestor, since the distribution of introns differs substantially between heterotrophs and phototrophs.
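The loss counts above follow the logic of a single-gain (Dollo-style) parsimony argument: an intron is gained once at the smallest clade containing all species that carry it, and every maximal subclade lacking it below that point costs one loss. The following sketch illustrates that logic on a hypothetical tree with placeholder species A to E; it is an assumption-labeled illustration, not the software or tree used in the study.

```python
from typing import Union

# A tree is either a species name (leaf) or a tuple of subtrees.
Tree = Union[str, tuple]


def leaf_set(tree: Tree) -> set[str]:
    """Return the names of all species (leaves) in a subtree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaf_set(child) for child in tree))


def gain_clade(tree: Tree, present: set[str]) -> Tree:
    """Smallest clade containing every species that has the intron (single gain)."""
    if isinstance(tree, str):
        return tree
    for child in tree:
        if present <= leaf_set(child):
            return gain_clade(child, present)
    return tree


def count_losses(tree: Tree, present: set[str]) -> int:
    """One loss per maximal subtree in which the intron is absent."""
    if not leaf_set(tree) & present:
        return 1  # lost on the branch leading to this subtree
    if isinstance(tree, str):
        return 0  # intron retained in this species
    return sum(count_losses(child, present) for child in tree)


def dollo_losses(tree: Tree, present: set[str]) -> int:
    """Losses implied if the intron was gained exactly once."""
    if not present:
        return 0
    return count_losses(gain_clade(tree, present), present)


# Hypothetical tree; the intron is present in A, C, and E.
toy = (("A", "B"), (("C", "D"), "E"))
print(dollo_losses(toy, {"A", "C", "E"}))
```

Comparing this loss count with the alternative of independent gains (for example, two gains versus one gain plus several losses) is the kind of trade-off weighed for the shared intron positions discussed in this section.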

Nonconventional Introns 

Only limited data on the nonconventional introns of euglenids are available. It is not known how they are removed or what their origin is; consequently, the relationship between conventional spliceosomal and nonconventional introns remains a mystery. Considering the lack of sequence conservation at intron junctions and the weakness or absence of base pairing between the 5' junction and U1 snRNA, a spliceosome-independent excision mechanism seems most likely. One is tempted to postulate the existence of an endonuclease, possibly similar to the endonucleases involved in the splicing of tRNA introns in archaea and eukaryotes (for review, see Calvin and Li 2008), which recognize a conserved secondary structure of the pre-tRNA and remove the intron, after which the exons are joined by a tRNA ligase. A similar mechanism is responsible for excising introns from mRNA and rRNA in archaea (for review, see Calvin and Li 2008). A case of nonconventional intron removal from pre-mRNA by an endonuclease not involved in tRNA splicing is also known: in yeast, mammals, and plants, the endonuclease Ire1 removes an intron from the mRNA encoding a transcription factor involved in the unfolded protein response (Gonzalez et al. 1999; Yoshida et al. 2001; Nagashima et al. 2011); however, no ortholog of this endonuclease has been found in the E. gracilis EST database (Russell et al. 2005).

The nonconventional introns of the euglenid tubulin genes show a distribution pattern different from that of the conventional ones. Nonconventional introns are present in five positions in the tubA gene and in nine in tubB; none of these 14 positions is common to all representatives of Euglenales, and 7 are unique to single species (fig. 1). This distribution pattern suggests that they are relatively recent and that nonconventional intron gain is much more frequent than conventional intron gain. Our conclusion is consistent with the original suggestion of Canaday et al. (2001), who first reported nonconventional introns in unique positions in E. gracilis tubulin genes. In Euglenales, an apparent recent gain of numerous introns is not limited to nuclear genes: a massive gain of self-splicing introns is also observed in their chloroplast genomes (Pombert et al. 2012; Wiegert et al. 2013). Such widespread intron gain suggests either an undefined evolutionary pressure promoting intervening sequences in various genomes or a common mechanism spreading them across different genomes. There are no common features between the nonconventional nuclear introns and the self-splicing organellar introns; however, the secondary structures of the group II and III introns that dominate euglenid chloroplast genomes sometimes differ from the conserved model (Hrda et al. 2012). Possible relationships between these two intron types deserve further investigation.

Do Intermediate Introns Really Exist? 
Intermediate introns were originally defined by Canaday et al. (2001), who noted that some introns in E. gracilis tubA and tubB genes (intron 5 in tubA and intron 9 in tubB, according to the numbering in fig. 1) combine features of both conventional introns (base pairing with U1 snRNA; presence of both consensus intron borders, or at least the 5' consensus GT/C) and nonconventional introns (stable RNA secondary structure). Another intermediate intron was defined in the E. gracilis fibrillarin gene (junctions AG|gate...ggag|GA, secondary structure present; Russell et al. 2005). The idea of distinguishing this type of intron as a potential transitional form between conventional and nonconventional introns seems attractive (Russell et al. 2005), although the direction of such a transition is unclear. In this study, intermediate introns were subdivided into two groups: C/I and I/N. The former group includes cases where junctions typical of conventional introns can be found at any of the possible intron positions (more than one possible position results from direct repeats at the junctions) and a stable secondary structure is observed. Three such introns were found: intron 4 in Pha. limnophila tubA, intron 5 in E. gracilis tubA, and intron 1 in E. agilis tubB. Intron 4 in the Pha. limnophila tubA gene and intron 1 in the E. agilis tubB gene are in positions shared with conventional introns, so the question is whether their secondary structures arose accidentally in ancestral conventional introns or whether they reflect kinship with nonconventional introns. Stable secondary structures can be formed by many cis- and trans-spliced conventional introns (Chen and Stephan 2003; van der Burgt et al. 2012; Roy et al. 2012) and could be responsible for bringing the two intron ends together before excision by the spliceosome. The nucleotide sequences and predicted secondary structures of the Pha. limnophila and E. agilis C/I introns do not fit the model of nonconventional intron junctions well (fig. 2), which could mean that they are in fact conventional. However, these introns should also be considered a potential transitional form between conventional and nonconventional introns, because a mutation disturbing the canonical GT-AG ends could lead to a change in their excision mechanism.

The third intron defined as C/I occurs in position 5 in the E. gracilis tubA gene; four alternative positions are possible for it, the last one giving GC-AG intron ends (fig. 3). In contrast to the Pha. limnophila and E. agilis C/I introns, no conventional introns are present at this position in other species, but a nonconventional intron 5 with an unambiguously defined position is present one nucleotide upstream in E. quartana. When the E. gracilis intron is assumed to share its position with the E. quartana intron, its ends no longer conform to the classical intron rule but instead fit the model of nonconventional intron junctions (fig. 3). Thus, the question of the mechanism of intron 5 removal in E. gracilis remains open. The shared position with the E. quartana intron, the similarity to the model of nonconventional intron junctions, and the lack of a typical polypyrimidine tract upstream of the 3' end of the intron suggest that this intron is most likely of the nonconventional type. The fact that no other newly acquired conventional introns can be found in Euglenales also supports this conclusion. By chance, the direct repeats at this intron's borders would also allow its excision by the conventional spliceosomal mechanism at a position one nucleotide downstream of the original one, preserving the coding potential of the mRNA.

Six introns were defined as I/N (only the 5' end, GT/C, is conserved); four of them occur in positions shared with nonconventional introns (8, 9, and 12 in tubB), whereas two are in positions unique to single species: intron 2 in the S. costata tubA gene and intron 11 in the T. similis tubB gene. The I/N intron in the S. costata tubB gene is in the same position as the nonconventional intron in L. spirogyroides (position 12). These species are not closely related, and it is unclear whether these introns were gained independently or are derived from a common ancestor (both scenarios are shown in fig. 1). The first scenario requires two intron gains, whereas the second requires a single intron gain and six independent intron losses. The position of intron 12 in S. costata is unequivocal, whereas in L. spirogyroides as many as nine positions are possible; however, the position shared with S. costata fits the model for nonconventional intron junctions best. The fundamental question regarding the I/N introns is whether they really are intermediate introns or simply nonconventional ones whose 5' GT/C ends arose by random mutations. According to the sequence logo for the 5' ends of nonconventional introns, most of them have a purine at the first position, while the second position is not conserved (fig. 2). Therefore, the probability of a GT/C end arising by chance is quite high. The 3' end of nonconventional introns is different: a pyrimidine is preferred at the last position, and the probability of an AG end seems low. In fact, not a single intron in the α- and β-tubulin genes has AG at the 3' end combined with a nonconventional 5' end. It therefore seems likely that the 5' GT/C ends of I/N introns arose by mutations preserving the RN end (a puRine followed by aNy nucleotide) typical of nonconventional introns, and that these introns are still excised by the mechanism specific to nonconventional introns. It should be emphasized, however, that nonconventional introns with GT/C 5' ends are more susceptible to being transformed into conventional ones. The crucial step is a transversion of the last nucleotide at the 3' end of the intron from a pyrimidine (the preferred state in nonconventional introns) to G. Such a situation is observed at the 3' end of an intermediate (I/N according to the nomenclature used in this study) intron in the E. gracilis fibrillarin gene (Russell et al. 2005), which has GA at the 5' end and AG at the 3' end; it can also be folded into a stable secondary structure that fits the model for nonconventional intron junctions very well. This intron seems to be a better example of the I/N type than the introns in the tubulin genes, which are possibly regular nonconventional introns.

Milanowski et al. • doi:10.1093/molbev/mst227 MBE

Fig. 3. Comparison of intron 5 sequences in tubA genes from Euglena gracilis and E. quartana. Four positions are possible for the E. gracilis intron because of the presence of direct repeats (shaded); in the fourth position the intron has GC-AG ends (in bold). If the E. gracilis intron is defined to coincide with the E. quartana intron position, the most conserved nucleotides in the secondary structure of nonconventional introns, CA-TG (boxed), are present in conserved positions +4, +5 (5' end of intron) and −7, −6 (3' end of intron). Nucleotides involved in forming the intron secondary structure are underlined; exon sequences are shown in upper case and intron sequences in lower case.

The Origin of Nonconventional Introns 
As discussed above, a change in the mechanism of intron excision through mutational changes of the intron sequence is plausible. However, given the different distribution patterns of conventional and nonconventional introns, it seems unlikely that such a change is the only mechanism of nonconventional intron gain. On the contrary, the 15 nonconventional intron gains, versus no conventional intron gains, in Euglenales suggest an independent origin of nonconventional introns. It is widely accepted that conventional spliceosomal introns evolved from group II self-splicing introns in the common ancestor of all eukaryotes. Moreover, at least seven mechanisms of spliceosomal intron gain in new positions have been proposed (for review, see Yenerall and Zhou 2012). In contrast, for nonconventional introns, both the time and the mechanism of their acquisition in the evolutionary past remain a mystery. It is unknown whether any of the mechanisms proposed to explain the gain of conventional introns also operates in nonconventional intron gain or whether the latter is unique to euglenids; it cannot be excluded that the mechanism of nonconventional intron gain is closely linked to the mechanism of intron removal. In this context, one of the mechanisms of conventional intron gain, transposon insertion, deserves particular attention (Giroux et al. 1994; Roy 2004). To date, no transposable elements have been identified in euglenid genomes, but this does not necessarily mean that they are absent, because those genomes have been explored only cursorily. The acquisition of a stable secondary structure by nonconventional intron RNA is made possible by the presence of inverted repeats at the intron ends. Inverted repeats are also characteristic of transposon ends, including those of miniature inverted-repeat transposable elements (MITEs) (Feschotte et al. 2002; Casacuberta and Santiago 2003; Jiang et al. 2004; Lu et al. 2012). MITEs are short (usually less than 1 kb), AT-rich elements that do not encode proteins; they are flanked by short (about 15 bp), often imperfect, inverted repeats that enable them to form stem-loop RNA structures. They have been found in numerous eukaryotic genomes, including those of plants, animals, and humans, and their location is often conserved between related taxa (Wessler 1998). MITEs are nonautonomous: for mobilization they require enzymes encoded by other transposons, from which they probably originated (Jiang et al. 2004). Although MITEs are widespread in eukaryotic genomes, the mechanism of their transposition remains unclear. Interestingly, they are preferentially located in euchromatin, within gene promoters, terminators, and introns, or even in coding sequences. The origin of nonconventional introns from MITE-like elements should therefore be considered a plausible hypothesis: inserted into new exonic locations in the genome, MITE-like structures could become intervening sequences excised from pre-mRNA by an unknown (perhaps transposition-related) mechanism.
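The shared feature discussed above, inverted repeats near the ends of both MITEs and nonconventional introns, can be screened for with a simple string search. The Python sketch below is an illustration under simplifying assumptions: exact Watson-Crick pairing only (no G·U wobble, no mismatches) and fixed terminal windows. It is not a substitute for a folding program such as RNAfold, which the study used for secondary-structure prediction. The function names, the 30-nt default window, and the example sequence are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def longest_inverted_repeat(intron: str, window: int = 30) -> tuple[int, str]:
    """Longest exact stretch in the 5' window whose reverse complement lies in the 3' window.

    Longest-common-substring dynamic program between the 5' window and the
    reverse complement of the 3' window; the windows are shrunk so they never
    overlap. Perfect pairing only, a simplification: MITE terminal repeats and
    intron stems are often imperfect.
    """
    seq = intron.upper()
    window = min(window, len(seq) // 2)
    if window == 0:
        return 0, ""
    five = seq[:window]
    three_rc = reverse_complement(seq[len(seq) - window:])
    best_len, best_end = 0, 0
    prev = [0] * (len(three_rc) + 1)
    for i in range(1, len(five) + 1):
        cur = [0] * (len(three_rc) + 1)
        for j in range(1, len(three_rc) + 1):
            if five[i - 1] == three_rc[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return best_len, five[best_end - best_len:best_end]


# Invented intron: ACGGTA near the 5' end pairs with TACCGT near the 3' end.
print(longest_inverted_repeat("GCACGGTATTTTTTTTTACCGTAG"))
```

For the invented intron this prints `(6, 'ACGGTA')`, i.e. a 6-bp potential stem. Allowing mismatches or G·U pairs would require an alignment-based or thermodynamic approach instead of exact matching.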

Conclusions and Perspectives 

The analysis of tubA and tubB gene sequences from photosynthetic euglenids reveals that nonconventional intron gains were very common over their evolutionary history, in contrast to conventional introns, which seem only to have been lost in some lineages. It also suggests that nonconventional introns originated differently from conventional ones. Insertion of MITE-like elements is proposed as a possible source of nonconventional introns. It seems likely that nonconventional introns can also arise through mutational changes in conventional intron sequences: an intron with signals recognized by two types of splicing mechanisms would be resistant to mutations affecting only one of those signals. However, the reverse scenario is also possible: conventional introns could arise through mutations creating canonical GT-AG splice sites at the ends of nonconventional introns. The so-called intermediate introns would play an important role in both scenarios.

We hope that highlighting the common features of nonconventional introns will aid future studies of the mechanism of their removal, as well as deeper analyses of euglenid genomes. Such analyses will be especially helpful in verifying the hypotheses presented here and in establishing the point in euglenid evolution at which nonconventional introns appeared.

Materials and Methods 

Strains and Culture Conditions 

Euglenid strains whose sequences were used in this study are described in supplementary table S3, Supplementary Material online. All strains were cultivated under identical conditions in a liquid soil-water medium enriched with a small piece of garden pea (medium 3c; Schlösser 1994), in a growth chamber maintained at 17 °C with a 16:8 h light/dark cycle and ca. 27 µmol photons·m⁻²·s⁻¹ provided by cool white fluorescent tubes (Philips).

RNA Isolation, cDNA Synthesis, Amplification, 
Cloning, and Sequencing 

Total RNA was isolated with the RNeasy Kit (Qiagen) using the animal tissues protocol (incubation with proteinase K was performed instead of mechanical disruption). RNA was treated with DNase I (Qiagen) to eliminate DNA contamination according to the manufacturer's instructions. First-strand cDNA was synthesized from total RNA using SuperScript III Reverse Transcriptase (Invitrogen) and a 17-nt oligo-dT primer. The products of the synthesis were used as a template in a PCR with the pair of degenerate primers F0/R0 (sequences of all primers used in this study are listed in supplementary table S4, Supplementary Material online). A 25 µl reaction mixture contained 0.5 U of Phusion High-Fidelity DNA Polymerase (Finnzymes), 0.2 mM dNTPs, 1.5 mM MgCl2, 10 pmol of each primer, reaction buffer HF (Finnzymes), and 1 µl of the first-strand cDNA (diluted 10×). The PCR protocol consisted of an initial 30 s at 98 °C, followed by 35 cycles of 10 s at 98 °C, 15 s at 54 °C (tubA F0/R0) or 53 °C (tubB F0/R0), and 30 s at 72 °C, with a final extension of 5 min at 72 °C. PCR products were sized on agarose gels, purified using the QIAEX II Gel Extraction Kit (Qiagen), sequenced directly from both strands by cycle sequencing using the BigDye Terminator Cycle Sequencing Ready Reaction Kit 3.1 (Life Technologies), and then analyzed on an ABI 3730 Genetic Analyser (Applied Biosystems). In some cases, PCR products were cloned into the pGEM-T Easy vector (Promega) after addition of adenosine at their 3' ends using Taq Polymerase (Qiagen). For each euglenid strain, several clones were chosen and plasmids were isolated; inserts were then cycle sequenced using the BigDye Terminator Cycle Sequencing Ready Reaction Kit 3.1 (Life Technologies) with the universal vector primers M13F and M13R.

DNA Isolation, Amplification, Cloning, and 
Sequencing 

Total genomic DNA was isolated from cultures with the DNeasy Tissue Kit (Qiagen) using the animal tissues protocol. DNA was treated with RNase A (Qiagen) to eliminate RNA contamination according to the manufacturer's instructions. Two-step nested amplification was performed to obtain a sufficient quantity of genomic PCR products. In the first reaction, the external degenerate primer pair F0/R0 was used, followed by amplification with the internal degenerate primer pair F1/R1. For tubA amplification from M. pyrum and T. similis, specific primers F0/R0 and F1/R1 were designed based on the cDNA sequences obtained earlier; for tubB amplification from E. gymnodinioides, specific primers F1/R1 were designed (see supplementary table S4, Supplementary Material online). In the first step, a 25 µl reaction mixture contained 0.5 U of Phusion High-Fidelity DNA Polymerase (Finnzymes), 0.2 mM dNTPs, 3.5 mM MgCl2, 10 pmol of each primer, reaction buffer GC (Finnzymes), Q-solution (Qiagen), 1.2 µg of Taq Single-Stranded DNA Binding Protein (EURx), and 10–50 ng of DNA. We found that addition of the Taq SSB protein was crucial for reaction specificity and yield. The PCR protocol consisted of 2 min at 98 °C, followed by 7 initial cycles of 30 s at 98 °C, 30 s at 54 °C (tubA F0/R0) or 53 °C (tubB F0/R0), and 2.5 min at 72 °C, then by 35 cycles of 15 s at 98 °C, 15 s at 54/53 °C, and 2.5 min at 72 °C, with a final extension of 5 min at 72 °C. In the second step, a 25 µl reaction mixture contained 0.5 U of Phusion High-Fidelity DNA Polymerase (Finnzymes), 0.2 mM dNTPs, 1.5 mM MgCl2, 10 pmol of each primer, reaction buffer GC (Finnzymes), Q-solution (Qiagen), 0.6 µg of Taq Single-Stranded DNA Binding Protein (EURx), and 1 µl of undiluted mixture from the first step. The PCR protocol consisted of an initial 2 min at 98 °C, followed by 35 cycles of 15 s at 98 °C, 15 s at 54 °C (tubA F1/R1) or 63 °C (tubB F1/R1), and 2.5 min at 72 °C. PCR products were sized on agarose gels, purified using the QIAEX II Gel Extraction Kit (Qiagen), and cloned into the pGEM-T Easy vector (Promega) after addition of adenosine at their 3' ends using Taq Polymerase (Qiagen). For each euglenid strain, several clones were chosen and plasmids were isolated; inserts were then cycle sequenced using the BigDye Terminator Cycle Sequencing Ready Reaction Kit 3.1 (Life Technologies) with the universal vector primers M13F and M13R, as well as the internal degenerate primers F2, R2, F3, and R3, which correspond to regions in the middle of the tubA and tubB genes. For M. pyrum tubA and for E. gymnodinioides and T. similis tubB, additional internal sequencing primers were needed (see supplementary table S4, Supplementary Material online). Products of the sequencing reactions were analyzed on an ABI 3730 Genetic Analyser (Applied Biosystems).

Sequence Analysis 

Sequences were assembled into contigs with the SeqMan program from the Lasergene package (DNASTAR). To minimize the risk of including inactive gene forms in the analysis, the genomic sequences were compared with the cDNA sequences (obtained by direct sequencing of PCR products). A genomic sequence that matched the cDNA sequence perfectly was included in further analysis. Where more than one form of a gene was obtained, all forms were included if at least one of them matched the cDNA perfectly. In a few cases where a mismatch was found between genomic and cDNA sequences, the cDNA-derived PCR products were cloned and several clones were sequenced; if at least one genomic form matched one of the cloned cDNA sequences perfectly, the genomic forms were included in the analysis. Gene forms that did not match the cDNA perfectly were retained because their intron distribution did not differ from that of forms whose expression was confirmed. To determine intron positions, the genomic and cDNA sequences were compared using the Mesquite program (Maddison WP and Maddison DR 2011). The RNA secondary structure of introns was predicted using the RNAfold WebServer (http://rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi, last accessed December 5, 2013). Nucleotide logos of nonconventional intron junctions were created using WebLogo 2.8.2 (Crooks and Hon 2004; http://weblogo.berkeley.edu/logo.cgi, last accessed December 5, 2013).
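Sequence logos of intron junctions, such as the one in fig. 2, plot for each aligned position the information content R = log2(4) − H, where H is the Shannon entropy of the observed nucleotide frequencies. The Python sketch below computes that quantity for a set of aligned junction sequences. It is an illustration, not WebLogo itself: it omits the small-sample correction that WebLogo subtracts, so for small alignments its values will be somewhat higher than WebLogo's. The function name and the input sequences are invented.

```python
import math
from collections import Counter


def information_content(aligned: list[str]) -> list[float]:
    """Per-position information content (bits) for aligned DNA sequences.

    R = 2 - H, with H the Shannon entropy of the observed base frequencies.
    No small-sample correction is applied (WebLogo applies one).
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    bits = []
    for col in range(length):
        counts = Counter(s[col].upper() for s in aligned)
        n = sum(counts.values())
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        bits.append(2.0 - entropy)
    return bits


# Invented 5' intron ends: a purine at position 1, any base at position 2.
print(information_content(["GA", "AT", "GC", "AG"]))
```

This prints `[1.0, 0.0]`: a position restricted to purines (G or A in equal proportions) carries 1 bit, while an unconstrained position carries none, mirroring the RN pattern described for nonconventional 5' ends.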

Phylogenetic Analysis 

After removing introns, tubA and tubB gene sequences were aligned separately using the Mesquite program (Maddison WP and Maddison DR 2011); previously published genes from E. gracilis and Ent. sulcatum were added to the set of sequences obtained in this study. Maximum-likelihood analysis was performed with PhyML 3.0 (Guindon et al. 2010) using models of sequence evolution and parameters estimated by jModelTest 2.1 (Posada 2008); for details, see the legend to supplementary figure S2, Supplementary Material online.
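Two steps described in the Methods, removing introns before alignment and accepting a genomic form only when its spliced sequence matches a cDNA sequence exactly, can be expressed with a small helper. This Python sketch assumes intron coordinates are already known and given as 0-based, half-open (start, end) pairs; the function names and the example sequences are hypothetical and are not part of the programs (SeqMan, Mesquite) used in the study.

```python
def splice_out(genomic: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns given as 0-based, half-open (start, end) coordinates."""
    pieces, cursor = [], 0
    for start, end in sorted(introns):
        if start < cursor or end < start or end > len(genomic):
            raise ValueError(f"invalid or overlapping intron ({start}, {end})")
        pieces.append(genomic[cursor:start])
        cursor = end
    pieces.append(genomic[cursor:])
    return "".join(pieces)


def matches_any_cdna(genomic: str, introns: list[tuple[int, int]], cdnas: list[str]) -> bool:
    """True if the spliced genomic form is identical to at least one cDNA form."""
    spliced = splice_out(genomic, introns).upper()
    return any(spliced == c.upper() for c in cdnas)


# Invented example: one intron spanning genomic positions 6-15 (end 16 exclusive).
genomic = "AATCAGGCTTTTTCAGCC"
print(splice_out(genomic, [(6, 16)]))
print(matches_any_cdna(genomic, [(6, 16)], ["AATCAGCA", "AATCAGCC"]))
```

The example prints `AATCAGCC` and then `True`, because the second (invented) cDNA form is identical to the spliced genomic sequence. Exact string equality mirrors the "perfect match" criterion; tolerating sequencing errors would require a different rule.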



Supplementary Material

Supplementary figures S1 and S2 and tables S1–S4 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).